Transductive Zero-Shot Learning with Visual Structure Constraint

Neural Information Processing Systems

To recognize objects of unseen classes, most existing Zero-Shot Learning (ZSL) methods first learn a compatible projection function between the common semantic space and the visual space from the data of the source (seen) classes, then directly apply it to the target (unseen) classes. However, in real scenarios the data distributions of the source and target domains may not match well, causing the well-known domain shift problem. Based on the observation that the visual features of test instances can be separated into distinct clusters, we propose a new visual structure constraint on class centers for transductive ZSL, to improve the generality of the projection function (i.e., to alleviate the domain shift problem). Specifically, three different strategies (symmetric Chamfer distance, bipartite matching distance, and Wasserstein distance) are adopted to align the projected unseen semantic centers with the visual cluster centers of the test instances. We also propose a new training strategy to handle the realistic case where many unrelated images exist in the test dataset, which is not considered by previous methods. Experiments on several widely used datasets demonstrate that the proposed visual structure constraint consistently brings substantial performance gains and achieves state-of-the-art results.
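The set-alignment idea in the abstract can be sketched with two of the three strategies: symmetric Chamfer distance and bipartite matching distance between a set of projected semantic centers and a set of visual cluster centers. This is an illustrative NumPy/SciPy sketch, not the authors' implementation; the function names are assumptions. (For uniform discrete distributions with equally many points, the bipartite matching cost also coincides with the optimal-transport cost underlying the Wasserstein strategy.)

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def chamfer_alignment_loss(semantic_centers, visual_centers):
    """Symmetric Chamfer distance between two sets of d-dim centers.

    Each center in one set is matched to its nearest neighbor in the
    other set; the squared distances are averaged in both directions.
    """
    # Pairwise squared Euclidean distances, shape (n_semantic, n_visual).
    d = np.linalg.norm(
        semantic_centers[:, None, :] - visual_centers[None, :, :], axis=-1
    ) ** 2
    # Nearest visual center for each semantic center, and vice versa.
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def bipartite_alignment_loss(semantic_centers, visual_centers):
    """One-to-one matching cost via the Hungarian algorithm."""
    d = np.linalg.norm(
        semantic_centers[:, None, :] - visual_centers[None, :, :], axis=-1
    ) ** 2
    rows, cols = linear_sum_assignment(d)  # optimal one-to-one assignment
    return d[rows, cols].mean()
```

In the paper these distances serve as training losses that pull the projected semantic centers toward the visual cluster centers; here they are shown only as set-to-set distance computations between fixed center matrices. Chamfer allows many-to-one matches, while bipartite matching enforces a one-to-one correspondence.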



Reviews: Transductive Zero-Shot Learning with Visual Structure Constraint

Neural Information Processing Systems

Strength:
- The paper proposes an interesting and novel approach for transductive zero-shot learning.
- It would be great to also include zero-shot performance on ImageNet (this is most likely missing because there are no attribute annotations for ImageNet, but the approach does not seem to be limited to attribute-based transfer).
- It would be interesting to quantitatively compare against [31] and [34], from which the authors took inspiration, as ablations of the authors' approach.
- The authors claim in the reproducibility checklist to have "Clearly defined error bars" and "A description of results with central tendency (e.g.
- The paper fails to discuss (qualitatively and quantitatively) recent related work, including [A].


Reviews: Transductive Zero-Shot Learning with Visual Structure Constraint

Neural Information Processing Systems

The submission originally received mixed scores that put it into the borderline region. The reviewers praised the simple and apparently effective method, but also noted a number of issues, in particular an unclear relation to [34] (which is itself rather unclear) as well as an insufficient experimental evaluation. In their response the authors provided additional information and results, which the reviewers appreciated. A detailed discussion followed, which ultimately led to the conclusion that the contribution is valuable and that the authors should not be punished for a lack of clarity in the prior work [34]. Therefore, the recommendation is to accept the work.



Transductive Zero-Shot Learning with Visual Structure Constraint

Wan, Ziyu, Chen, Dongdong, Li, Yan, Yan, Xingguang, Zhang, Junge, Yu, Yizhou, Liao, Jing

Neural Information Processing Systems
